Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode
Abstract
Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bitwidth and mixed-precision arithmetic have been proven to be valid strategies for tackling these problems. We present Dustin, a fully programmable compute cluster integrating 16 RISC-V cores capable of 2- to 32-bit arithmetic in all possible mixed-precision combinations. In addition to the conventional Multiple-Instruction Multiple-Data (MIMD) processing paradigm, Dustin introduces a Vector Lockstep Execution Mode (VLEM) to minimize power consumption in highly data-parallel kernels. In VLEM, a single leader core fetches instructions and broadcasts them to the 15 follower cores. Clock gating the Instruction Fetch (IF) stages and private caches of the follower cores leads to a 38% power reduction. The cluster, implemented in 65 nm CMOS technology, achieves a peak performance of 58 GOPS and a peak energy efficiency of 1.15 TOPS/W.
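As a rough illustration of what mixed-precision arithmetic on packed operands means, the Python sketch below packs signed 2-bit weights and signed 8-bit activations into 32-bit words and computes their dot product in software. The function names (`pack`, `unpack`, `mixed_dot`), the lane ordering, and the choice of an 8-bit by 2-bit combination are assumptions made for this example only; it does not reproduce Dustin's instruction set or hardware datapath.

```python
def pack(values, bits):
    """Pack signed integers into one 32-bit word (lane 0 in the lowest bits).

    Hypothetical helper: the lane order is an assumption for illustration.
    """
    lanes = 32 // bits
    assert len(values) == lanes, "need exactly 32/bits values per word"
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        assert lo <= v <= hi, "value out of range for this bit width"
        word |= (v & mask) << (i * bits)  # two's-complement truncation
    return word


def unpack(word, bits):
    """Extract the signed lanes of a 32-bit word, sign-extending each one."""
    lanes = 32 // bits
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for i in range(lanes):
        raw = (word >> (i * bits)) & mask
        out.append(raw - (1 << bits) if raw & sign else raw)
    return out


def mixed_dot(act_words, weight_word, act_bits=8, weight_bits=2):
    """Dot product of packed activations with packed lower-precision weights.

    One 32-bit word of 2-bit weights holds 16 lanes, so it pairs with
    four 32-bit words of 8-bit activations (4 lanes each).
    """
    weights = unpack(weight_word, weight_bits)
    acts = [a for w in act_words for a in unpack(w, act_bits)]
    assert len(acts) == len(weights), "operand lane counts must match"
    return sum(a * w for a, w in zip(acts, weights))


# Usage: 16 weights in one word, 16 activations spread over four words.
weights = [1, -1, 0, -2] * 4
acts = list(range(-8, 8))
w_word = pack(weights, 2)
a_words = [pack(acts[i:i + 4], 8) for i in range(0, 16, 4)]
result = mixed_dot(a_words, w_word)
```

The example shows why low bit widths reduce memory footprint: sixteen 2-bit weights occupy a single 32-bit word, a quarter of the storage the same weights would need at 8 bits.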
Similar resources
An Ultra High CMRR Low Voltage Low Power Fully Differential Current Operational Amplifier (COA)
This paper presents a novel fully differential (FD) ultra-high common mode rejection ratio (CMRR) current operational amplifier (COA) with very low input impedance. Its FD structure, which attenuates common mode signals over all stages, grants an ultra-high CMRR and power supply rejection ratio (PSRR), making it suitable for mixed-mode and accurate applications. Its performance is verified by HSPI...
A 10-bit 50-MS/s Parallel Successive-Approximation Analog-to-Digital Converter
Applications such as high-definition video reproduction, portable computers, wireless, and multimedia demand an ever-increasing need for high-frequency, high-resolution, and low-power analog-to-digital converters. Flash, two-step flash, and pipeline converters are fast but consume a large amount of power and require a large area. To overcome these problems, successive approximation converter blo...
Ultra Low Power 1-Bit Full Adder
In this paper we propose a new 9-transistor 1-bit full adder. The proposed circuit performs efficiently in the subthreshold region, for use in ultra-low-power applications. The main design objectives for this new circuit are low power consumption and full voltage swing at a low supply voltage. The proposed cell also remarkably improves the power consumption and power-delay product, and has better noise ...
PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision
Novel pervasive devices such as smart surveillance cameras and autonomous micro-UAVs could greatly benefit from the availability of a computing device supporting embedded computer vision at a very low power budget. To this end, we propose PULP (Parallel processing Ultra-Low Power platform), an architecture built on clusters of tightly-coupled OpenRISC ISA cores, with advanced techniques for fas...
An Ultra-Low-Power 75mV 64-Bit Current-Mode Majority-Function Adder
Ultra-low-power circuits are becoming more desirable due to growing portable device markets and they are also becoming more interesting and applicable today in biomedical, pharmacy and sensor networking applications because of the nano-metric scaling and CMOS reliability improvements. In this thesis, three main achievements are presented in ultra-low-power adders. First, a new majority function...
Journal
Journal title: IEEE Transactions on Circuits and Systems I: Regular Papers
Year: 2023
ISSN: 1549-8328, 1558-0806
DOI: https://doi.org/10.1109/tcsi.2023.3254810